Keynote Speaker
Carl Yang
Expediting Next-Generation AI for Health via KG and LLM Co-Learning
Abstract
Large language models (LLMs) have brought disruptive progress to information technology, from accessing data to performing
analytical tasks. While demonstrating unprecedented capabilities, LLMs have been found unreliable in tasks requiring
factual knowledge and rigorous reasoning, posing critical challenges in domains such as healthcare. Knowledge graphs (KGs)
have been widely used to explicitly organize and index biomedical knowledge, but the quality and coverage of KGs are
hard to scale up given the notoriously complex and noisy nature of healthcare data, which span multiple modalities and institutions.
Existing approaches show promise in combining LLMs and KGs to enhance each other, but they have not studied these techniques in
real healthcare contexts and scenarios. In this talk, I will introduce our research vision and agenda for KG-LLM co-learning
in healthcare, followed by successful examples from our recent explorations of LLM-aided KG construction, KG-guided LLM enhancement,
and federated multi-agent systems. I will conclude the talk by discussing future directions that can benefit from further
collaborations with researchers interested in data mining or biomedical informatics in general.
Bio
Carl Yang is an Assistant Professor of Computer Science at Emory University, jointly appointed at the Department of Biostatistics
and Bioinformatics in the Rollins School of Public Health and the Center for Data Science in the Nell Hodgson Woodruff School of
Nursing. He received his Ph.D. in Computer Science from the University of Illinois Urbana-Champaign in 2020 and his B.Eng. in
Computer Science and Engineering from Zhejiang University in 2014. His research interests span graph data mining, applied machine learning,
knowledge graphs and federated learning, with applications in recommender systems, social networks, neuroscience and healthcare.
Carl's research results have been published in 150+ peer-reviewed papers in top venues across data mining and biomedical informatics.
He is also a recipient of the Dissertation Completion Fellowship of UIUC in 2020, the Best Paper Award of ICDM in 2020, the Best
Paper Award of KDD Health Day in 2022, the Best Paper Award of ML4H in 2022, the Amazon Research Award in 2022, the Microsoft
Accelerating Foundation Models Research Award in 2023, and multiple Emory internal research awards. Carl's research is funded
by both the NSF and the NIH.
Invited Speaker
Jiabin Tang
Graph Language Models
Abstract
In the realm of graph-based research, understanding and leveraging graph structures has become increasingly important, given their
wide range of applications in network analysis, bioinformatics, and urban science. Graph Neural Networks (GNNs) and their heterogeneous
counterparts (HGNNs) have emerged as powerful tools for capturing the intricate relationships within graph data. However, despite their
advancements, these models often struggle with generalization in zero-shot learning scenarios and across diverse heterogeneous graph
datasets, especially in the absence of abundant labeled data for fine-tuning. To address these challenges, we recently introduced two
novel frameworks, “GraphGPT: Graph Instruction Tuning for Large Language Models” and “HiGPT: Heterogeneous Graph Language Model”,
which are designed to enhance the adaptability and applicability of graph models in various contexts. GraphGPT presents a pioneering
approach by integrating Large Language Models (LLMs) with graph structural knowledge through a graph instruction tuning paradigm.
This model leverages a text-graph grounding component and a dual-stage instruction tuning process, incorporating self-supervised
graph structural signals and task-specific instructions. This technique enables the model to comprehend complex graph structures and
achieve remarkable generalization across different tasks without the need for downstream graph data. On the other hand, HiGPT focuses
on heterogeneous graph learning by introducing a heterogeneous graph instruction-tuning paradigm that eliminates the need for
dataset-specific fine-tuning. It features an in-context heterogeneous graph tokenizer and employs a large corpus of heterogeneity-aware
graph instructions, complemented by a Mixture-of-Thought (MoT) instruction augmentation strategy. This allows HiGPT to adeptly handle
distribution shifts in node token sets and relation type heterogeneity, thereby significantly improving its generalization capabilities
across various learning tasks.
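
To make the instruction-tuning idea concrete, the toy Python sketch below shows one way a graph and a task instruction might be paired into a single training prompt. It is purely illustrative: the names (Graph, serialize, build_instruction), the <graph> delimiter, and the plain-text serialization are hypothetical, and the actual GraphGPT/HiGPT systems use learned graph tokens from a text-graph grounding encoder rather than raw text.

    # Illustrative sketch only; not the authors' actual code.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Graph:
        nodes: List[str]              # node text attributes (e.g., paper titles)
        edges: List[Tuple[int, int]]  # directed edges between node indices

    def serialize(graph: Graph) -> str:
        """Flatten a graph into a token sequence an LLM can condition on.
        Real systems replace this with learned graph tokens from a GNN encoder."""
        node_part = "; ".join(f"[{i}] {t}" for i, t in enumerate(graph.nodes))
        edge_part = ", ".join(f"{u}->{v}" for u, v in graph.edges)
        return f"<graph> nodes: {node_part} | edges: {edge_part} </graph>"

    def build_instruction(graph: Graph, task: str) -> str:
        """Pair structural context with a task-specific instruction, echoing the
        dual-stage idea (stage 1: structural signals; stage 2: task instructions)."""
        return f"{serialize(graph)}\nInstruction: {task}\nAnswer:"

    if __name__ == "__main__":
        g = Graph(
            nodes=["GNNs for citations", "LLMs for graphs", "Graph transformers"],
            edges=[(1, 0), (1, 2)],
        )
        print(build_instruction(g, "Which nodes does node [1] cite?"))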
Bio
Jiabin Tang is a first-year Ph.D. student in Data Science at The University of Hong Kong (HKU), supervised by Prof. Chao Huang
and Prof. Benjamin C.M. Kao. His research interests lie in 1) large language models and other AIGC techniques; 2) graph learning and trustworthy
machine learning; and 3) related deep learning applications, e.g., spatio-temporal data mining and recommendation. He has published papers
at top international AI conferences such as KDD, SIGIR, CIKM, and WWW. He is the lead author of GraphGPT (SIGIR 2024) and HiGPT (KDD 2024),
as well as a co-author of LLMRec (Most Influential Paper at WSDM 2024) and UrbanGPT (KDD 2024). GraphGPT is ranked among the top three most
influential papers of SIGIR 2024; it has been cited over 70 times and has garnered significant attention in the open-source community on
GitHub, receiving over 510 stars.